Audio-concept features and hidden Markov models for multimedia event detection

نویسندگان

  • Benjamin Elizalde
  • Mirco Ravanelli
  • Karl Ni
  • Damian Borth
  • Gerald Friedland
چکیده

Multimedia event detection (MED) on user-generated content is the task of finding an event, e.g., a Flash mob or Attempting a bike trick, using its content characteristics. Recent research has focused on approaches that use semantically defined “concepts” trained with annotated audio clips. Using audio concepts allows us to show semantic evidence of their relationship to events, by looking at the probability distribution of the audio concepts per event. However, while the concept-based approach has been useful in image detection, audio concepts have generally not surpassed the performance of low-level audio features like Mel Frequency Cepstral Coefficients (MFCCs) in addressing the unstructured acoustic composition of video events. Such audio-concept based systems could benefit from temporal information, due to one of the intrinsic characteristics of audio: it occurs across a time interval. This paper presents a multimedia event detection system that uses audio concepts; it exploits the temporal correlation of audio characteristics for each particular event at two levels. The first level involves analyzing the shortand long-term surrounding context information for the audio concepts, through an implementation of a Hierarchical Deep Neural Network (H-DNN), to determine engineered audio-concept features. At the second level, we use Hidden Markov Models (HMMs) to describe the continuous and non-stationary characteristics of the audio signal throughout the video. Experiments using the TRECVID MED 2013 corpus show that an HMM system based on audio-concept features can perform competitively when compared with an MFCC-based system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Audio self organized units for high-level event detection

High-level multimedia event detection aims to identify videos containing a target event. Recent approaches leveraging audio information for this task fall into two broad categories. The first corresponds to holistic bag-of-words approaches based on frame-level descriptors. These are effective for classification, but hard for humans to interpret. The second corresponds to approaches that build a...

متن کامل

Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, name...

متن کامل

Indexing multimedia documents with acoustic concept recognition lattices

The amount of multimedia data is increasing every day and there is a growing demand for high-accuracy multimedia retrieval systems that go beyond retrieving simple events (e.g., detecting a sport video), to more specific and hard-to-detect events (e.g., a point in a tennis match). To retrieve these complex events, audio content features play an important role since they provide complementary in...

متن کامل

Event-Coupled Hidden Markov Models

Inferences from time-series data can be greatly enhanced by taking into account multiple modalities. In some cases, such as audio of speech and the corresponding video of lip gestures, the different time-series are tightly coupled. We are interested in loosely-coupled time series where only the onset of events are coupled in time. We present an extension of the forward-backward algorithm that c...

متن کامل

Segmentation and Indexation of Broadcast News

This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on simple heu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014